Abstract

We present an incremental maintenance algorithm for leapfrog triejoin. The algorithm maintains rules in time proportional 
(modulo log factors) to the edit distance between leapfrog triejoin traces. 
1 Introduction

Incremental evaluation is a perennial topic in computer science. The basic problem is easily described: given an expensive computation, and some change to its inputs, we want to efficiently update the result, without recomputing from scratch.

The most common traditional approach to such problems is to consider a computation graph, in which edges carry values, and vertices represent small computations, inputs or outputs. For example, the arithmetic expression r = (a + b) * (c + d) could be represented by this graph:

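[Figure: computation graph for r = (a + b) * (c + d), with the inputs a, b, c, d at the bottom and the output r at the top.]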



We can regard the above graph as a trace, i.e., a low-level history of the computation. When an input such as b changes, the effect of the change can be rippled through the graph, re-evaluating only those vertices whose inputs change. We can call this process trace maintenance; if done properly, it takes time proportional to the number of changes required to update the trace.
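As an illustration only, here is a minimal Python sketch of trace maintenance on the graph for r = (a + b) * (c + d). The class `Graph` and its methods are hypothetical names: every vertex caches its value (the cached values play the role of the trace), and a changed input is rippled through the graph in topological order, re-evaluating only vertices downstream of a value that actually changed.

```python
import heapq
from operator import add, mul


class Graph:
    """Toy computation graph; the cached vertex values play the role of the trace."""

    def __init__(self):
        self.ops = {}          # vertex -> (function, argument vertices)
        self.values = {}       # cached value of every vertex
        self.users = {}        # vertex -> vertices that read its value
        self.order = {}        # creation order, which is a topological order
        self.evaluations = 0   # operator re-evaluations performed by change_input

    def _add_vertex(self, name):
        self.order[name] = len(self.order)
        self.users[name] = []

    def input(self, name, value):
        self._add_vertex(name)
        self.values[name] = value

    def op(self, name, fn, *args):
        self._add_vertex(name)
        self.ops[name] = (fn, args)
        for a in args:
            self.users[a].append(name)
        self.values[name] = fn(*(self.values[a] for a in args))

    def change_input(self, name, value):
        """Ripple a change, re-evaluating only vertices whose inputs changed."""
        if self.values[name] == value:
            return
        self.values[name] = value
        heap = [(self.order[u], u) for u in set(self.users[name])]
        heapq.heapify(heap)
        queued = {u for _, u in heap}
        while heap:  # pops come out in increasing topological order
            _, v = heapq.heappop(heap)
            fn, args = self.ops[v]
            self.evaluations += 1
            new = fn(*(self.values[a] for a in args))
            if new != self.values[v]:
                self.values[v] = new
                for u in self.users[v]:
                    if u not in queued:
                        queued.add(u)
                        heapq.heappush(heap, (self.order[u], u))


g = Graph()
for name, value in {"a": 1, "b": 2, "c": 3, "d": 4}.items():
    g.input(name, value)
g.op("a+b", add, "a", "b")
g.op("c+d", add, "c", "d")
g.op("r", mul, "a+b", "c+d")         # r = 3 * 7 = 21
g.change_input("b", 5)                # re-evaluates a+b and r only
print(g.values["r"], g.evaluations)   # 42 2
```

The vertex c + d is never touched: the work done is proportional to the part of the trace that changes.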

In databases the problem of incremental evaluation is known as incremental view maintenance. A view is simply a 
query installed in a database, and kept up-to-date as its input relations change. 

Incremental view maintenance has historically been done using one of two techniques H): 

1. Syntactic approaches derive special rules to update a view. For example, given a rule such as:

C(x) <- A(x), B(x).

which computes the intersection A ∩ B, one can automatically derive rules to update the view when elements are inserted into A or B. For example, a rule which says 'If x is inserted into A, and x ∈ B, then insert x into C' can be written:

+C(x) <- +A(x), B(x)

There are two challenges associated with this approach: (a) for complex rules one can encounter a combinatorial explosion of update rules; and (b) the update rules may be difficult to evaluate efficiently. In particular, any claim of efficiency for this approach must resort to a deus ex machina appeal to the strength of the query optimizer.

2. Algebraic approaches follow a trace-maintenance approach, at a coarse level of granularity, where each vertex in the graph represents an algebra operator (e.g., join, projection). One defines special maintenance algorithms for each operator, so that e.g. a projection can be maintained efficiently when its input relation is updated.
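For the intersection example in item 1, the derived update rules can be sketched in Python as follows. This is our own illustration rather than a rule generator: we assume A and B may both change in one transaction, so the insertion rules consult the new versions A', B' and the deletion rules consult the old versions A, B.

```python
def intersection_delta(A, B, A_new, B_new):
    """Delta rules for C(x) <- A(x), B(x), derived per body atom."""
    ins_A, del_A = A_new - A, A - A_new
    ins_B, del_B = B_new - B, B - B_new
    # +C(x) <- +A(x), B'(x).    +C(x) <- +B(x), A'(x).
    plus_C = {x for x in ins_A if x in B_new} | {x for x in ins_B if x in A_new}
    # -C(x) <- -A(x), B(x).     -C(x) <- -B(x), A(x).
    minus_C = {x for x in del_A if x in B} | {x for x in del_B if x in A}
    return plus_C, minus_C


A, B = {1, 2, 3}, {2, 3, 4}
A_new, B_new = {1, 2, 4}, {2, 4, 5}
plus_C, minus_C = intersection_delta(A, B, A_new, B_new)
print(sorted(plus_C), sorted(minus_C))  # [4] [3]
assert ((A & B) - minus_C) | plus_C == A_new & B_new
```

Even this two-atom rule already needs four update rules once both inputs may change.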

In this paper we present an incremental maintenance algorithm for leapfrog triejoin [6], a join algorithm with 
worst-case optimality guarantees. This maintenance algorithm is implemented in the Delve runtime engine of our 
commercial Datalog system LogicBlox®. 

Our approach to incremental maintenance is rather different from the usual database approaches, and is loosely inspired by the dynamization procedure of Acar et al. It hews to the trace maintenance approach, arguably the traditional technique in computer science. Unlike the algebraic approaches mentioned above, we maintain the trace at a very fine level of granularity: at the level of individual iterator operations in the leapfrog triejoin algorithm. Our maintenance algorithm has time cost proportional to trace edit distance (modulo log factors), giving it an optimality guarantee.
1.1 Aspiration



Suppose we have some Datalog rule, for example:

F(x,y) <- G(x,z), H(y,z), I(x,y,z).

After evaluating this rule to calculate F(x,y), some transaction(s) are committed that modify G, H, and/or I, and we wish to update F to reflect these changes. In many situations it is prohibitively expensive to recalculate F from scratch, so we instead aim to efficiently maintain F(x,y) based on the changes made to the predicates.

The aspiration of our maintenance algorithm is maintenance cost proportional to trace edit distance. Unpacking this a 
bit: 

- By maintenance cost, we mean the number of steps required to maintain a rule, i.e. produce new versions of 
the head predicates in response to changes made to body predicates. 

- By trace, we mean a low-level step-by-step description of the operations performed during full evaluation 
of a rule, i.e., a succinct history of the computation. We maintain traces at the level of predicate iterator 
operations, so these steps might include items such as "position the iterator for G(x, z) at a least upper 
bound for (x = 1531, z = 142)". 

- By trace edit distance, we mean comparing side-by-side the trace for full-evaluation on the original predicates 
(e.g. G, H, I) with the trace for full-evaluation on the modified predicates (e.g. G', H', I'), and counting how 
many changes must be made to the original trace to turn it into the trace for full-evaluation on the modified 
predicates. 

For a trivial illustration, suppose I have a rule

C[x] = z <- z = A[x] + B[x].



If I evaluate this rule in a hypothetical debugging mode where each step is logged, I might get a table like this: 



x | A[x] | B[x] | C[x]
1 | 10   |      | 10
2 |      | 1    |
3 | 30   | 1    | 31
4 |      |      |
5 |      |      |

This table is a rough approximation of what we mean by a 'trace': a step-by-step description of the evaluation. Now suppose I make some changes to A[x] and do another full evaluation, which produces this table (with differences marked by an asterisk):



x | A[x] | B[x] | C[x]
1 | 10   |      | 10
2 | 20*  | 1    | 21*
3 | 30   | 1    | 31
4 |      |      |
5 | 50*  |      | 50*



If I run the Unix command 'diff' on these two tables, I get:



< 2 |      | 1    |
> 2 | 20   | 1    | 21
< 5 |      |      |
> 5 | 50   |      | 50
The length of this diff hints at what we mean by 'trace edit distance': the number of changes (edits) you'd need to
make to the original trace to turn it into the trace of full-evaluation on the modified predicates. 
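To make the idea concrete, the two traces above can be compared mechanically. The sketch below is an illustration under our own encoding (one tuple per row, `None` for an empty cell); it uses Python's standard `difflib.SequenceMatcher` and counts the rows that must be replaced, inserted or deleted.

```python
from difflib import SequenceMatcher

# Rows are (x, A[x], B[x], C[x]); None marks an empty cell.
old_trace = [(1, 10, None, 10), (2, None, 1, None), (3, 30, 1, 31),
             (4, None, None, None), (5, None, None, None)]
new_trace = [(1, 10, None, 10), (2, 20, 1, 21), (3, 30, 1, 31),
             (4, None, None, None), (5, 50, None, 50)]


def trace_edit_distance(old, new):
    """Number of rows replaced, inserted or deleted to turn old into new."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return sum(max(i2 - i1, j2 - j1) for tag, i1, i2, j1, j2 in ops if tag != "equal")


print(trace_edit_distance(old_trace, new_trace))  # 2
```

A real trace records iterator operations rather than table rows, but the measure is the same: the size of the diff.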

There is a long history of work on the general problem of incremental maintenance (not just for Datalog) suggesting that this goal is achievable; the challenge is finding a solution that achieves this goal yet performs well.

1.2 Summary of the maintenance algorithm 

We give a brief sketch of our approach, for orientation. 

First, some notation. Given two versions C, C' of a predicate C(x), we write (C ⋯ C') for the difference between the two versions. By this we mean a 'delta' relation of the form:

$$(C \cdots C') = \{(x, \mathrm{INSERT}) : x \notin C,\ x \in C'\} \cup \{(x, \mathrm{ERASE}) : x \in C,\ x \notin C'\}$$

i.e., a predicate enumerating the differences between C and C', with the Δ variable taking on the values INSERT and ERASE.
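In set terms, (C ⋯ C') is just the two one-sided differences tagged with Δ. A minimal Python sketch, with predicates modelled as sets and the hypothetical helpers `delta` and `apply_delta`:

```python
INSERT, ERASE = "INSERT", "ERASE"


def delta(C, C_new):
    """(C ... C'): tag each element that appears in only one version."""
    return ({(x, INSERT) for x in C_new - C} |
            {(x, ERASE) for x in C - C_new})


def apply_delta(C, d):
    """Replay a delta on the old version to obtain the new one."""
    erased = {x for x, kind in d if kind == ERASE}
    inserted = {x for x, kind in d if kind == INSERT}
    return (C - erased) | inserted


C, C_new = {1, 2, 3}, {2, 3, 4}
d = delta(C, C_new)
print(sorted(d))  # [(1, 'ERASE'), (4, 'INSERT')]
```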

Consider the rule mentioned above: 

F(x,y) <- G(x,z), H(y,z), I(x,y,z).

Let Body[G,H,I](x,y,z) = G(x,z), H(y,z), I(x,y,z) be the body of the rule. The basic approach to maintenance is to evaluate a rule of the form:

δF(x,y,Δ) <- (Body[G,H,I] ⋯ Body[G',H',I'])(x,y,z,Δ),
             ChangeOracle(x,y,z).

- The left-hand side δF(x,y,Δ) gives a set of changes to be applied to the predicate F to produce the updated predicate F'.

- The term (Body[G,H,I] ⋯ Body[G',H',I'])(x,y,z,Δ) enumerates differences in the satisfying assignments of the rule-body for G, H, I vs. G', H', I'. This is done by simply evaluating both bodies and comparing the results.

- The ChangeOracle(x,y,z) term serves to restrict evaluation to just those regions of the tuple-space where changes might occur. The maintenance rule would be correct (but inefficient) if ChangeOracle(x,y,z) were omitted.
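The following brute-force Python sketch illustrates the semantics of this maintenance rule. It is not the change-oracle developed later: `coarse_oracle` is a hypothetical stand-in that admits a tuple (x,y,z) only if one of its three body facts changed, which is sound for this conjunctive body. The sketch checks that restricting evaluation with the oracle yields the same delta as omitting it.

```python
INSERT, ERASE = "INSERT", "ERASE"


def holds(G, H, I, x, y, z):
    """Is (x, y, z) a satisfying assignment of G(x,z), H(y,z), I(x,y,z)?"""
    return (x, y, z) in I and (x, z) in G and (y, z) in H


def coarse_oracle(old, new):
    """Admit (x,y,z) only if G(x,z), H(y,z) or I(x,y,z) changed between versions."""
    (G, H, I), (G2, H2, I2) = old, new
    dG, dH, dI = G ^ G2, H ^ H2, I ^ I2
    return lambda x, y, z: (x, z) in dG or (y, z) in dH or (x, y, z) in dI


def delta_body(old, new, oracle=lambda x, y, z: True):
    """(Body[G,H,I] ... Body[G',H',I'])(x,y,z,Delta), restricted by the oracle."""
    out = set()
    for x, y, z in old[2] | new[2]:   # I mentions every variable, so it supplies candidates
        if not oracle(x, y, z):
            continue
        before, after = holds(*old, x, y, z), holds(*new, x, y, z)
        if after and not before:
            out.add(((x, y, z), INSERT))
        elif before and not after:
            out.add(((x, y, z), ERASE))
    return out


old = ({(1, 10), (2, 20)}, {(5, 10), (6, 20)}, {(1, 5, 10), (2, 6, 20)})
new = ({(1, 10), (3, 30)}, {(5, 10), (6, 20), (7, 30)},
       {(1, 5, 10), (2, 6, 20), (3, 7, 30)})
d = delta_body(old, new, coarse_oracle(old, new))
assert d == delta_body(old, new)      # same answer without the oracle
print(sorted(d))  # [((2, 6, 20), 'ERASE'), ((3, 7, 30), 'INSERT')]
```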

There are two primary tasks: 

1. How to represent head predicates, and how to update them given deltas. We describe these techniques in Section 2, and some specialized data structures and algorithms in Section 3.

2. How to construct and employ the ChangeOracle predicate. This is described in Section 4.

1.3 Background, terminology, and notations 

Our variant of Datalog supports both relations such as R(x_1, ..., x_k) and functions such as F[x_1, ..., x_k] = y. We refer to functions and relations as predicates. A relation or function symbol together with its arguments is an atom.

A Datalog rule is written in the form head <- body. A rule head contains one or more atoms; a body is a first-order formula. Users write rules in a relaxed form without quantifiers, for example:

S(x,y) <- A(x,y), B(y,z).

Internally, rules are represented in the more restricted form of Figure 1 with explicit quantifiers; for example:

∀x,y . S(x,y) <- ∃z . A(x,y), B(y,z)
rule     ::= ∀x . head <- conj
head     ::= atom [, atom ...]
conj     ::= (production not legible in this copy)
dform    ::= atom | disj | negation
disj     ::= conj; conj [; conj ...]
negation ::= !conj

Fig. 1: Internal representation of rules.



Variables occurring in both head and body are placed in the rule-level universal quantifier block (e.g., x, y); variables occurring only in the body are ascribed to the existential quantifier block of the smallest conjunction (conj) encompassing their uses (e.g., z).

A materialized predicate is one whose elements are stored in a data structure. Materialized predicates can be either extensional (EDB) or intensional (IDB):

- An extensional (EDB) predicate is one whose contents can be directly manipulated by transactions that insert 
and remove records. 

- An intensional (IDB) predicate (aka view) is defined by one or more Datalog rules. IDB predicates are maintained incrementally in response to changes made to EDB predicates.

A primitive is a function or relation that is calculated on demand. For example, the function add[x, y] = z is a 
primitive that computes z = x + y. 

1.3.1 Key- and value-position 

For a materialized predicate atom R(x_1, ..., x_k) or F[x_1, ..., x_k] = y, we say the variables x_1, ..., x_k appear in key-position. (If F is a primitive operation, e.g., add[x_1, x_2] = y, we do not count it as having key-position appearances of variables.)

A variable is deemed a key if it appears anywhere in key-position in the body of a rule; otherwise it is a value. A binding for the key variables of a rule uniquely determines the values. For example, in the expression F[x] = a, G[y] = b, r = a + b, the variables x, y are keys, and the variables a, b, r are values.

2 Maintaining head predicates

In this section we describe how to maintain head predicates as changes are made to satisfying assignments of the body.

2.1 Projection-free rules

A rule is projection-free if each atom in the head contains an appearance of every key variable. For example:



∀x,y,z . R(x,y,z) <- A(x,y), B(y,z)



is projection-free, whereas: 



∀x,y . S(x,y) <- ∃z . A(x,y), B(y,z)

is not, because the key variable z does not appear in the head atom S(x,y).

Maintaining head predicates for projection-free rules is easy: we simply insert or remove records in response to 
the changes made to satisfying assignments of the body. 

2.2 Rules with projection 

For rules with projection we primarily use counting [5]. Consider the rule:

S(x,y) <- A(x,y), B(y,z).

We represent S by a predicate S[x,y] = η, where η is a support count: the number of satisfying assignments of the body producing (x,y). (In the example rule, there might be several bindings of z for a given (x,y).) Then, for δS(x,y,Δ), we respond to Δ = INSERT by incrementing η, and to Δ = ERASE by decrementing η. We use special data structure support (an update-action) that treats η as a reference count, so that a decrement of η resulting in η = 0 causes the record to be deleted.
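A minimal sketch of counting for S(x,y) <- A(x,y), B(y,z), in Python with a `collections.Counter` standing in for the S[x,y] = η predicate. We assume (our choice) that each delta still carries the projected-away z, so two assignments that differ only in z are counted separately instead of collapsing into one (x, y, Δ) fact. The helper names are hypothetical.

```python
from collections import Counter

INSERT, ERASE = "INSERT", "ERASE"


def support_counts(A, B):
    """Full evaluation: eta for each (x, y) is the number of z with A(x,y), B(y,z)."""
    S = Counter()
    for x, y in A:
        for y2, z in B:
            if y == y2:
                S[(x, y)] += 1
    return S


def apply_body_deltas(S, deltas):
    """Each delta ((x, y, z), kind) is one satisfying assignment gained or lost."""
    for (x, y, z), kind in deltas:
        key = (x, y)
        if kind == INSERT:
            S[key] += 1
        else:
            S[key] -= 1
            if S[key] == 0:   # reference count reached zero: delete the record
                del S[key]
    return S


A, B = {(1, 2)}, {(2, 7), (2, 8)}
S = support_counts(A, B)                      # Counter({(1, 2): 2})
apply_body_deltas(S, [((1, 2, 7), ERASE)])    # still supported by z = 8
print(dict(S))                                # {(1, 2): 1}
apply_body_deltas(S, [((1, 2, 8), ERASE)])
print((1, 2) in S)                            # False
```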

Functions appearing in rule heads are handled in a similar way: suppose the head predicate is F[s] = t. We maintain the head predicate as F[s] = (t, η), where η is the support count. Given a set of deltas to apply, we order them so ERASE actions are applied first, to avoid issues with conflicting function values.
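The ordering rule for function heads can be sketched as below (Python, hypothetical names). F maps a key s to a pair [t, η]; raising an error when two different values are live for the same key is our assumption about how a genuine conflict would surface.

```python
INSERT, ERASE = "INSERT", "ERASE"


def apply_function_deltas(F, deltas):
    """F maps s to [t, eta]; deltas are (s, t, kind), one per body assignment."""
    # Stable sort: ERASE actions first, so a replaced value is gone before its successor arrives.
    for s, t, kind in sorted(deltas, key=lambda d: d[2] != ERASE):
        if kind == ERASE:
            F[s][1] -= 1
            if F[s][1] == 0:
                del F[s]
        elif s in F:
            if F[s][0] != t:
                raise ValueError(f"conflicting values for key {s!r}")
            F[s][1] += 1
        else:
            F[s] = [t, 1]
    return F


F = {"k": ["old", 1]}
apply_function_deltas(F, [("k", "new", INSERT), ("k", "old", ERASE)])
print(F)  # {'k': ['new', 1]}
```

Applied in the given order, the INSERT would have met the still-live value "old" and been reported as a conflict.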

2.2.1 Short-circuit evaluation 

In some cases we can avoid the use of reference counts by using short-circuit evaluation. For a rule such as:

S(x,y) <- A(x,y), B(y,z).

it is helpful to explicitly insert a quantifier for z:

S(x,y) <- ∃z . A(x,y), B(y,z).

Suppose the key order chosen by the optimizer is [x, y, z]. Given particular x, y, it is obviously of little use to enumerate all possible satisfying assignments for z. We can instead use short-circuit evaluation: as soon as the first satisfying assignment for an (x,y) is produced, we can backtrack immediately without considering further assignments of z. In this case, the support count η is unnecessary. However, in some cases it might be more efficient to use a key order such as [z, y, x], in which case short-circuit evaluation cannot be used. This decision is left to the query optimizer.
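A minimal Python sketch of short-circuit evaluation for S(x,y) <- ∃z . A(x,y), B(y,z) under key order [x, y, z]. `B_by_y` (a dict from y to its z values) is a hypothetical index on B, and the `probes` counter only shows how little of the z-level is visited.

```python
def eval_short_circuit(A, B_by_y):
    """Key order [x, y, z]: stop at the first witness z for each (x, y)."""
    S, probes = set(), 0
    for x, y in sorted(A):
        for z in B_by_y.get(y, ()):
            probes += 1
            S.add((x, y))
            break          # one witness suffices, so no support count is kept
    return S, probes


A = {(1, 2), (3, 4)}
B_by_y = {2: [7, 8, 9]}    # y = 4 has no z at all
S, probes = eval_short_circuit(A, B_by_y)
print(S, probes)           # {(1, 2)} 1
```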

2.3 Aggregations 

Our variant of Datalog supports aggregations such as sum, count, min, and max. For example, the following rule 
computes the total calories consumed by people from meals: 

CaloriesConsumed[person] = totcal <-
  agg<<totcal = sum(cal)>>
  ate(person, meal), caloriesOf[meal] = cal.

2.3.1 Aggregations: count 

Count aggregations can be handled by using the same mechanism used for support counts of rules with projections. For example:

outdegree[x] = d <-
  agg<<d = count()>>
  E(x,y).

can be implemented using a head predicate outdegree[x] = d, where d is a support count as described above. The support count is incremented and decremented in response to changes in satisfying assignments of the rule body: if a new satisfying assignment E(x,y) is found, then d is incremented; if a satisfying assignment is removed, then d is decremented, and the record is removed from outdegree when d = 0.

2.3.2 Aggregations: the Abelian group case 

For sum aggregations over an Abelian group (G, +, −, 0), where the operator + is associative and commutative, we can use an update-action that updates the aggregate by 'adding' the new value when Δ = INSERT, and 'adding' the inverse ('negative') of the removed value when Δ = ERASE. We also employ a support count η to remove a record once no more satisfying assignments of the body contribute to it.

This style of aggregation can be used for sum aggregations over integers and fixed-precision data types. 
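A minimal Python sketch of this update-action (the helper name is hypothetical). Each record holds [total, η]; the key "bob" shows why the support count is needed: a sum can be zero while contributions still exist.

```python
INSERT, ERASE = "INSERT", "ERASE"


def apply_sum_deltas(agg, deltas):
    """agg maps a group key to [total, eta]; deltas are (key, value, kind).

    INSERT adds the value, ERASE adds its inverse (-value); eta counts the
    contributing assignments so the record is dropped exactly when it reaches zero.
    """
    for key, value, kind in deltas:
        total, eta = agg.get(key, (0, 0))
        if kind == INSERT:
            total, eta = total + value, eta + 1
        else:
            total, eta = total - value, eta - 1
        if eta == 0:
            agg.pop(key, None)
        else:
            agg[key] = [total, eta]
    return agg


agg = {}
apply_sum_deltas(agg, [("alice", 300, INSERT), ("alice", 200, INSERT),
                       ("bob", 5, INSERT), ("bob", -5, INSERT)])
print(agg)  # {'alice': [500, 2], 'bob': [0, 2]}
apply_sum_deltas(agg, [("alice", 200, ERASE), ("bob", 5, ERASE), ("bob", -5, ERASE)])
print(agg)  # {'alice': [300, 1]}
```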

2.3.3 Aggregations: the semigroup case 

Min and max aggregations cannot be treated as Abelian group aggregations, since there is no inverse: i.e., no operation a ↦ a⁻¹ such that ∀a . max(a⁻¹, a) = I, where I = −∞ is the identity element for max. We instead treat these as aggregations over a semigroup (M, ⊕), where ⊕ is an associative binary operator (e.g., min, max).

Consider the example aggregation: 

A[x] = ms <-
  agg<<ms = max(s)>>
  D[x,y,z] = s.

We use an intermediate predicate that supports scans, as described in Section 3.1. Each satisfying assignment of the body is inserted into this intermediate predicate by a special rule:

A_max-scan[x,y,z] = s <-
  D[x,y,z] = s.

The head predicate A[x] = ms is computed by performing scans on the A_max-scan predicate, which we can write:

A[x] = Scan(A_max-scan, [x, −∞, −∞], [x, +∞, +∞])

That is, for each x, A[x] is computed by taking a scan of all records in the interval from [x, −∞, −∞] to [x, +∞, +∞], where −∞, +∞ are the smallest/largest representable values of the datatype.

For each change in satisfying assignments of the body, we insert or remove records to/from the intermediate predicate A_max-scan[x,y,z] = s, and then recompute whatever records of A[x] = ms could have changed. This lets us maintain the aggregation result in time O(δ log n), where δ is the number of changes in satisfying assignments of the body and n is the number of records in the intermediate predicate.
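A minimal Python sketch of the max aggregation via an intermediate predicate. `MaxScanIndex` and `maintain_max` are hypothetical names; a sorted list with `bisect` stands in for the scan-enabled predicate, so `scan` here folds the records of a key range linearly rather than in O(log n) as a scan-tree would.

```python
import bisect
from functools import reduce

INSERT, ERASE = "INSERT", "ERASE"
NEG, POS = float("-inf"), float("inf")


class MaxScanIndex:
    """A_max-scan[x,y,z] = s, stored as sorted (x, y, z, s) tuples."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        bisect.insort(self.rows, row)

    def erase(self, row):
        i = bisect.bisect_left(self.rows, row)
        if i < len(self.rows) and self.rows[i] == row:
            del self.rows[i]

    def scan(self, lo, hi, op=max):
        """Fold op over the s values of all rows with lo <= key <= hi."""
        i = bisect.bisect_left(self.rows, lo)
        j = bisect.bisect_right(self.rows, hi)
        values = [row[-1] for row in self.rows[i:j]]
        return reduce(op, values) if values else None


def maintain_max(index, A, deltas):
    """Apply body deltas to the index, then recompute A[x] for each touched x."""
    touched = set()
    for row, kind in deltas:
        (index.insert if kind == INSERT else index.erase)(row)
        touched.add(row[0])
    for x in sorted(touched):
        ms = index.scan((x, NEG, NEG), (x, POS, POS))
        if ms is None:
            A.pop(x, None)
        else:
            A[x] = ms
    return A


index, A = MaxScanIndex(), {}
maintain_max(index, A, [((1, 1, 1, 5.0), INSERT), ((1, 2, 1, 9.0), INSERT),
                        ((2, 1, 1, 3.0), INSERT)])
print(A)                                        # {1: 9.0, 2: 3.0}
maintain_max(index, A, [((1, 2, 1, 9.0), ERASE), ((2, 1, 1, 3.0), ERASE)])
print(A)                                        # {1: 5.0}
print(index.scan((1, 1, NEG), (1, 1, POS)))     # A'[1,1] = 5.0
```

The last line reuses the same index for the finer aggregation A'[x,y], in the spirit of the note that follows.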

Note: we can reuse the intermediate predicate A_max-scan[x,y,z] = s to provide aggregations at multiple levels of detail. For example, if we also wanted to know the maximum s for a given (x, y) pair, we could define a rule:

A'[x,y] = ms <-
  agg<<ms = max(s)>>
  D[x,y,z] = s.

which could share the A_max-scan[x,y,z] = s intermediate predicate with the rule calculating A[x] = ms:

A'[x,y] = Scan(A_max-scan, [x, y, −∞], [x, y, +∞])

String concatenation aggregations can also be handled using the semigroup approach; this can be made efficient by representing long strings using ropes [2].
2.3.4 Aggregations: floating-point sums



Floating-point sum aggregations are problematic because floating-point addition is not associative. The Abelian group approach described above would allow arbitrarily large errors to accumulate over time as the sum was maintained. The semigroup approach would produce answers that depended in subtle ways on the precise structure of the scan-tree (Section 3.1), due to nonassociativity; it would also have the undesirable requirement of storing all satisfying assignments of the rule body in an intermediate data structure.

The sensible alternative is to employ a head-predicate with an arbitrary-precision floating-point value, which lets 
us use the Abelian group approach. That is, for an aggregation such as: 

F[x] = tot <-
  agg<<tot = total(v)>>
  G[x,y] = v.

where v is a floating-point value, we use an intermediate predicate of the form F*[x] = (tot*, η), where tot* is represented using an arbitrary-precision type, and use the update technique described in Section 2.3.2.

There is a useful trick that can be employed here to efficiently represent tot*. Consider a floating-point sum $S = \sum_{i \in I} s_i$. Instead of representing S directly, we can instead represent the sum $X + \sum_{i \in I} s_i$, where $X \in \mathbb{R}$ is a value such as:

$$X = \sum_{j=-512}^{512} 2^{4j}$$

The binary representation of X is, e.g.:

10001000100010001000 ⋯ 1000100010001.000100010001000 ⋯ 1000100010001

We partition X into 52-bit segments, this being the number of mantissa bits in an IEEE 754 double-precision floating-point number. We only store the 52-bit segments of $X + \sum_{i \in I} s_i$ that differ from the corresponding segment of X, representing each segment as a floating-point number. Since X has 1-bits at regular intervals, any borrowing required to accommodate a negative summand never requires increasing the Hamming distance between X and $X + \sum_{i \in I} s_i$ by more than 4 bits. (Consider for example representing the sum of $S = \{2^{500}, -1\}$: with this representation we do not have to borrow from $2^{500}$, which would cause a run of 500 1s in the representation; instead we just borrow from $2^0$. We would store only two segments, the one containing $2^{500}$ and the one containing $2^0$.)

To extract F[x] = tot from the intermediate predicate F*[x] = (tot*, η), we use a rule of the form:

F[x] = tot <-
  F*[x] = (tot*, η),
  tot = toFloat[tot*].

The primitive toFloat is straightforward to implement: we identify the first (most significant) bit-position where $X + \sum_{i \in I} s_i$ differs from X. The value tot is positive if the bit of X at that position is zero, and negative if it is one. We subtract X, and extract a 52-bit mantissa. Combined with an exponent and sign, this yields a double-precision floating-point quantity.
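The following Python sketch checks the Hamming-distance argument on the S = {2^500, −1} example. It is an illustration under our own assumptions: values are scaled by 2^2048 so that X becomes an integer (summands finer than 2^-2048, such as subnormal doubles, are rejected), segments are aligned at bit 0 of the scaled value, and `to_float` uses exact rational arithmetic (`fractions.Fraction`) instead of the bit-scanning procedure described above.

```python
from fractions import Fraction

SHIFT = 2048        # scale by 2**SHIFT so that X and all summands are integers
SEGMENT_BITS = 52   # mantissa bits of an IEEE 754 double

# X = sum_{j=-512}^{512} 2^(4j), scaled: a 1-bit every 4 positions.
X = sum(1 << (4 * j + SHIFT) for j in range(-512, 513))


def add(acc, value):
    """INSERT adds value; ERASE is add(acc, -value). Exact, no rounding."""
    scaled = Fraction(value) * (1 << SHIFT)
    if scaled.denominator != 1:
        raise ValueError("summand finer than 2**-SHIFT")
    return acc + int(scaled)


def changed_segments(acc):
    """Indices of the 52-bit segments where acc differs from X (the only ones stored)."""
    diff = acc ^ X
    return {i // SEGMENT_BITS for i in range(diff.bit_length()) if (diff >> i) & 1}


def to_float(acc):
    """Subtract X, undo the scaling and round once to the nearest double."""
    return float(Fraction(acc - X, 1 << SHIFT))


acc = add(add(X, 2 ** 500), -1)          # S = {2^500, -1}
print(bin(acc ^ X).count("1"))           # 3 bits differ from X
print(bin(2 ** 500 - 1).count("1"))      # 500 one-bits without the offset
print(sorted(changed_segments(acc)))     # [39, 49]
```

Because every summand is added exactly and rounding happens once at extraction, inserting and later erasing a value leaves no residual error.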

3 Algorithms & Data structures
3.1 Scans 

Scans (prefix-sums) are a handy formalism for aggregation-like operations [3]. We employ them for semigroup aggregations (Section 2.3.3), and also for implementing queries on sensitivity indices (Section 4.1).

Given an array A = [a_1, a_2, ..., a_n] and an associative operator ⊕, the scan of A is just a_1 ⊕ a_2 ⊕ ⋯ ⊕ a_n. If we choose ⊕ to be addition, we get the sum; if we choose ⊕ to be the max operator, we get the maximum element.
Suppose we want to calculate the aggregation over an arbitrary interval, i.e., a_i ⊕ ⋯ ⊕ a_j where 1 ≤ i ≤ j ≤ n. (For example, if the A array contained sales values for each day, we might want to aggregate sales over a specific month, rather than over all time.)

Associativity of the ⊕ operator permits a simple data structure that can calculate the scan of any interval in O(log n) time. Let A_ij = a_i ⊕ ⋯ ⊕ a_j be the aggregation over the elements a_i, ..., a_j. We construct a binary tree (a scan-tree) with each leaf a single element a_i, and each internal node storing the ⊕-sum of its children:




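[Figure: a scan-tree with leaves a_1, ..., a_8; each internal node A_ij stores the ⊕-sum of its two children, and the root is A_18.]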



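For example, the left child of the root is:

$$A_{14} = A_{12} \oplus A_{34} = (a_1 \oplus a_2) \oplus (a_3 \oplus a_4)$$

To calculate the aggregation of an arbitrary interval we take the ⊕-sum, in left-to-right order, of the maximal subtrees contained entirely in the interval. For example:

$$\begin{aligned}
a_1 \oplus \cdots \oplus a_8 &= A_{18} \\
a_1 \oplus \cdots \oplus a_3 &= A_{12} \oplus a_3 \\
a_2 \oplus \cdots \oplus a_8 &= a_2 \oplus A_{34} \oplus A_{58} \\
a_3 \oplus \cdots \oplus a_7 &= A_{34} \oplus A_{56} \oplus a_7 \\
a_4 \oplus \cdots \oplus a_7 &= a_4 \oplus A_{56} \oplus a_7
\end{aligned}$$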



For any interval a_i, ..., a_j, we never need to combine more than 2⌈log₂ n⌉ = O(log n) elements.

If a value changes, say a_5 is changed to a_5', we can update the scan-tree by simply recalculating all internal nodes on the path from a_5 to the root (A_56, A_58, A_18). This requires only O(log n) operations.
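A minimal array-based scan-tree in Python, as a sketch of the structure described above; the layout (node 1 is the root, leaves sit at positions size to size + n − 1) is our choice. It relies only on associativity, not on an identity element or commutativity, so it also works for max, min or string concatenation. `query` combines the O(log n) maximal subtrees in left-to-right order and `update` recomputes the path to the root.

```python
class ScanTree:
    """Binary scan-tree over a[0..n-1] for an associative operator op."""

    def __init__(self, values, op):
        self.op, self.n, self.size = op, len(values), 1
        while self.size < self.n:
            self.size *= 2
        self.node = [None] * (2 * self.size)     # None marks an empty (padding) subtree
        self.node[self.size:self.size + self.n] = values
        for i in range(self.size - 1, 0, -1):
            self.node[i] = self._combine(self.node[2 * i], self.node[2 * i + 1])

    def _combine(self, left, right):
        if left is None:
            return right
        if right is None:
            return left
        return self.op(left, right)

    def update(self, i, value):
        """Change a[i], then recompute the O(log n) nodes on the path to the root."""
        i += self.size
        self.node[i] = value
        i //= 2
        while i:
            self.node[i] = self._combine(self.node[2 * i], self.node[2 * i + 1])
            i //= 2

    def query(self, i, j):
        """a[i] (+) ... (+) a[j], inclusive, from O(log n) maximal subtrees."""
        left, right = None, None
        lo, hi = i + self.size, j + self.size + 1
        while lo < hi:
            if lo & 1:
                left = self._combine(left, self.node[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self._combine(self.node[hi], right)
            lo //= 2
            hi //= 2
        return self._combine(left, right)


t = ScanTree(list("abcdefgh"), lambda p, q: p + q)   # concatenation: order matters
print(t.query(1, 7))      # bcdefgh, i.e. a_2 (+) ... (+) a_8 with 0-based indices
t.update(4, "E")          # a_5 changes: recompute A_56, A_58, A_18
print(t.query(0, 7))      # abcdEfgh
m = ScanTree([3, 1, 4, 1, 5, 9, 2, 6], max)
print(m.query(2, 5))      # 9
```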

For a concrete example, suppose we have a predicate sales[region, store] = tot giving the total sales for each store, and we wish to maintain the maximum sales of any store in each region:

maxsales[region] = maxtot <- agg<<maxtot = max(tot)>>
  sales[region, store] = tot.

Shown below is a scan-tree for calculating the max-aggregation of a sales predicate with 16 records. The first three columns give the (region, store, tot) records, and the scan-tree is drawn to the right. The records relevant for region 2 have been highlighted:



[Figure: the 16 sales records sorted by (region, store), with stores 1 to 16 spread over regions 1, 2 and 3, and a max scan-tree drawn to the right with the region 2 records highlighted. Legible records include (1, 1, 1000.00), (1, 2, 1500.00), (2, 10, 1245.00), (2, 11, 7024.00), (2, 12, 5510.00), (2, 13, 9000.00), (3, 14, 325.00), (3, 15, 4000.00) and (3, 16, 5300.00).]



To calculate the maximum sales for region 2, we can just take the max over the maximal subtrees lying entirely within region 2:

maxsales[2] = max(2900.00, 3500.00, 7024.00, 9000.00)
            = 9000.00



3.1.1 Adapting scan-trees for paged data structures 

In practice, we use a Btree-like data structure where leaf and index pages are augmented with scan information. 
Each leaf page is augmented with a ScanTree data structure that uses approximately 5% of the available space. 
It maintains a scan-tree for the records stored on the leaf page. This scan-tree uses a binary tree structure, but 
each scan-tree-leaf might aggregate a dozen or so records. For example, the sales data might be represented on a Btree-like leaf page by this scan-tree:



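[Figure: a leaf-page scan-tree whose internal nodes hold max values such as 15000, 8000, 9000 and 5300; each scan-tree leaf aggregates a few records, e.g. (1, 1, 1000.00), (1, 2, 1500.00), (2, 10, 1245.00), (2, 11, 7024.00), (2, 12, 5510.00), (2, 13, 9000.00), (3, 14, 325.00), (3, 15, 4000.00), (3, 16, 5300.00).]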



The records being aggregated (shown in small font) are not actually stored in the scan-tree. 

Each leaf in the scan-tree aggregates a variable number of records, so that insertions and deletions can be handled efficiently. The tree is occasionally rebalanced to ensure no child aggregates more than twice as many records as its sibling. This permits the scan-tree to be updated in O(log B) amortized time (where B is the Btree leaf page record capacity) in response to a record insert/update/delete.

For Btree-style index pages, we augment each record with an extra field containing scan information. On a Btree index page, records are typically of the form (key_1, ..., key_k; pageid), where pageid is the page number of a next-level leaf or index page. We add an additional scan-related field, so records are of the form (key_1, ..., key_k; scan, pageid), where scan aggregates all records in the subtree reachable at pageid. In addition, each index page gets a ScanTree (taking approx. 5% of the page space) that aggregates the scan elements of the index-page records. This approach allows us to calculate the scan of any interval in an arbitrarily large predicate in O(log n) time, where n is the number of records.

3.1.2 Efficient iteration of complements

Suppose we have a set S ⊆ T, and we wish to iterate the complement T \ S. This can be done efficiently using representations for S and T that include a scan-tree for a 'count' aggregation. Such a scan-tree lets us count the number of records in an interval [k_1, k_2] in O(log n) time, where k_1, k_2 are keys (or key-tuples).

To iterate the complement, we can employ the principle that if a given key interval [k_1, k_2] contains the same number of records in S and T, then the complement T \ S is empty in that interval. This reduces the cost of iterating the complement to O(|T \ S| · log n), a useful improvement for sparse complements. (The naive approach of iterating T and doing lookups in S would require O(|T|) time.)
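A Python sketch of the counting principle, assuming sorted lists of unique keys with S ⊆ T. Binary search (`bisect`) stands in for the count scan-trees, and the recursion splits T's key range only where the counts of S and T disagree; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_in(sorted_keys, lo, hi):
    """Records in [lo, hi]; a count scan-tree answers this in O(log n)."""
    return bisect_right(sorted_keys, hi) - bisect_left(sorted_keys, lo)


def complement(T, S):
    """Yield the keys of T missing from S (S must be a subset of T)."""
    def visit(i, j):                 # the records T[i:j], i.e. key range [T[i], T[j-1]]
        if i >= j:
            return
        if count_in(S, T[i], T[j - 1]) == j - i:
            return                   # equal counts: no complement records in this range
        if j - i == 1:
            yield T[i]
            return
        mid = (i + j) // 2
        yield from visit(i, mid)
        yield from visit(mid, j)
    yield from visit(0, len(T))


T = list(range(0, 200, 2))
S = [k for k in T if k not in (18, 150)]
print(list(complement(T, S)))        # [18, 150]
```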
[Figure: the records (first column) are integer intervals, including [11,100], [29,47], [40,42], [49,82], [62,78], [63,73], [67,72], [72,87], [77,96], [82,94], [83,99], [86,98], [90,93], [93,100] and [98,107]; the scan-tree of [min start, max end] pairs is drawn to the right, with root [11,107].]

Fig. 2: An interval-tree. Nodes whose intervals contain x = 80 are highlighted.



3.2 Interval trees 

We use scan-trees to implement interval trees, used to represent sensitivity indices in our maintenance algorithm (Section 4.1).

A simple interval tree stores a set of intervals I, with each interval of the form [a, b] where a, b ∈ K and K is some scalar key type. An interval query finds the set of intervals containing some key x of interest, i.e.

IntervalQuery(x) = {[a,b] ∈ I : x ∈ [a,b]}

For example, we might have

I = {[2,10], [3,7], [5,15], [6,9]}

and in response to the query 'What intervals contain 10?' it would produce {[2,10], [5,15]}.

To implement an interval tree, we can use a scan-tree, where each internal node of the scan-tree has a pair [a, b], where a is the min of the interval starts, and b is the max of the interval ends.

Figure 2 shows an example. The records are in the first column (integer intervals), and the scan-tree to the right. To find all intervals containing a particular number x, we start at the root and recursively descend to each child, backtracking when the scan-interval does not contain x. The nodes whose intervals contain x = 80 are highlighted in Figure 2. The result set produced for x = 80 is {[11,100], [49,82], [72,87], [77,96]}.

Interval trees produce the set of containing intervals for a value x in time O((m + 1) log n), where m is the number of matching intervals and n is the total number of intervals.
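A small Python sketch of the interval query over a scan-tree, assuming the intervals fit in memory and are sorted by start; each node stores [min start, max end] of its subtree as described above. The names `Node`, `build` and `interval_query` are hypothetical, and the example uses the set I from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]


@dataclass
class Node:
    lo: int                              # min of the interval starts below this node
    hi: int                              # max of the interval ends below this node
    interval: Optional[Interval] = None  # set only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: List[Interval]) -> Optional[Node]:
    """Balanced scan-tree over intervals already sorted by start."""
    if not intervals:
        return None
    if len(intervals) == 1:
        a, b = intervals[0]
        return Node(a, b, interval=(a, b))
    mid = len(intervals) // 2
    left, right = build(intervals[:mid]), build(intervals[mid:])
    return Node(min(left.lo, right.lo), max(left.hi, right.hi), left=left, right=right)


def interval_query(node: Optional[Node], x: int,
                   out: Optional[List[Interval]] = None) -> List[Interval]:
    """Descend from the root, backtracking wherever [lo, hi] cannot contain x."""
    if out is None:
        out = []
    if node is None or not (node.lo <= x <= node.hi):
        return out
    if node.interval is not None:
        out.append(node.interval)
    else:
        interval_query(node.left, x, out)
        interval_query(node.right, x, out)
    return out


root = build(sorted([(2, 10), (3, 7), (5, 15), (6, 9)]))
print(interval_query(root, 10))   # [(2, 10), (5, 15)]
```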

To adapt interval trees for paged data structures, we use a Btree augmented for scans, configured for a max-scan on the endpoint of each interval. Since Btree-type data structures are ordered by key, and index pages store the least key of their subtrees, Btrees have a built-in min-aggregation for the startpoint of each interval.

For general sensitivity indices (Section 4.1), we use predicates with records of the form

SensIndex(α_1, α_2, ..., α_m, a, b, γ_1, ..., γ_k)

This is understood to represent an interval of tuples beginning at [α_1, α_2, ..., α_m, a] and ending at [α_1, α_2, ..., α_m, b]. The γ_1, ..., γ_k contain supplemental information described later.

3.3 Delta-iterators 

Our current implementation of paged data structures uses copy-on-write page-level versioning. This allows us to iterate through the difference between two consecutive versions of a predicate in O(δ log n) time, where δ is the number of changes made between the two versions, and n is the maximum record count of the two versions. This is done by iterating through the two versions simultaneously, and skipping any subtrees common to both versions.
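A toy Python model of delta-iteration over copy-on-write versions: pages are immutable, an update copies only the pages on the root-to-leaf path, and the diff skips any subtree that both versions share by identity. `Page`, `set_leaf` and `delta` are hypothetical names, and the model assumes no page splits or merges between versions.

```python
INSERT, ERASE = "INSERT", "ERASE"


class Page:
    """Immutable page: a leaf holds sorted keys, an index page holds child pages."""
    __slots__ = ("keys", "children")

    def __init__(self, keys=(), children=()):
        self.keys, self.children = tuple(keys), tuple(children)


def set_leaf(page, path, keys):
    """Copy-on-write: copy only the pages along `path`; all other pages are shared."""
    if not path:
        return Page(keys=keys)
    children = list(page.children)
    children[path[0]] = set_leaf(children[path[0]], path[1:], keys)
    return Page(children=children)


def delta(old, new, stats):
    """Yield (key, INSERT/ERASE), skipping subtrees shared by both versions."""
    if old is new:
        return
    stats["visited"] += 1
    if not old.children:                          # leaf page
        a, b = set(old.keys), set(new.keys)
        yield from sorted([(k, INSERT) for k in b - a] + [(k, ERASE) for k in a - b])
        return
    for o, n in zip(old.children, new.children):
        yield from delta(o, n, stats)


leaves = [Page(keys=k) for k in ((1, 2), (3, 4), (5, 6), (7, 8))]
v1 = Page(children=[Page(children=leaves[:2]), Page(children=leaves[2:])])
v2 = set_leaf(v1, [1, 0], (5, 9))                 # replace 6 by 9 in the third leaf
stats = {"visited": 0}
print(list(delta(v1, v2, stats)), stats)
# [(6, 'ERASE'), (9, 'INSERT')] {'visited': 3}
```

Only the three pages on the modified path are examined; the untouched left subtree is skipped in one identity test.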

4 Maintaining rule bodies 

We now describe our maintenance algorithm for rule bodies. Recall the example: 

F(x,y) <- G(x,z), H(y,z), I(x,y,z).

We wish to maintain F(x,y) given updated versions of the body predicates G', H', I'. We evaluate a maintenance rule of the form:

δF(x,y,Δ) <- (Body[G,H,I] ⋯ Body[G',H',I'])(x,y,z,Δ),
             ChangeOracle(x,y,z).

where (Body[G,H,I] ⋯ Body[G',H',I'])(x,y,z,Δ) tabulates changes in satisfying assignments of the rule body, and ChangeOracle(x,y,z) restricts evaluation to regions of the (x,y,z) tuple space where changes may occur. Roughly speaking, if you assert a new fact, the change-oracle tells you where it could be used; if you retract a fact, the oracle tells you where it was used. The use of the change-oracle is crucial to efficiency.

The ChangeOracle predicate is the essential heart of our maintenance algorithm. During initial full-evaluation of the rule for F, we build indices that note how changes to the predicates G, H, I might affect evaluation. To produce the ChangeOracle predicate, we combine these indices with the differences between the body predicates, (G ⋯ G'), (H ⋯ H') and (I ⋯ I'). Doing this efficiently requires some special algorithms and data structures described in Section 3.

Using delta-iterators (Section 3.3), we can efficiently enumerate the changes to the body predicates; let:

δG(x,z,Δ) = (G ⋯ G')(x,z,Δ)
δH(y,z,Δ) = (H ⋯ H')(y,z,Δ)
δI(x,y,z,Δ) = (I ⋯ I')(x,y,z,Δ)

To construct the change-oracle, we need to determine what portions of the (x,y,z) tuple-space might need to be revisited, given the changes δG, δH, and δI. For this we use sensitivity indices that record how the rule evaluation is sensitive to changes in the body predicates.

^ A more sophisticated planned data structure, cascading trees, does versioning in a way that minimizes the number of pages altered. With 
cascading trees, the number of pages that must be examined for delta-iteration is O {SB^^ log S), where B is the average leaf-page capacity. In 
practice, this means that e.g. 50 changes, even to widely scattered keys, will usually be concentrated on a single page. 
4.1 Sensitivity indices

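In our Datalog system, queries are evaluated using the leapfrog triejoin algorithm (LFTJ) [6]. Here is an example leapfrog join for A(x), B(x), where A = {0, 2, 4, 5, 6} and B = {1, 2, 6, 7}:

[Figure: iterators over A and B connected by arrows labelled with the operations performed (e.g. seek(1), seek(6), next); the join produces A ∩ B = {2, 6}.]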

The leapfrog join begins by positioning an iterator at the start of each predicate, then repeatedly applying these 
rules (demonstrated by the arrows in the above diagram): 

- If either iterator is at end, then stop. 

- If both iterators are positioned at the same key, emit this key. Then increment one iterator. 

- Otherwise, take the iterator positioned at the lesser key, and do a seek-lub to the key at which the other 
iterator is positioned. 

We count the trace as the operations performed on the iterators, and their results (e.g., one step in the above would be 'seek(6) from x=4 to x=6 on iterator A').

LFTJ can handle most ∃₁ queries, but at the lowest level they are implemented in terms of trie-iterator operations such as open(), up(), next(), and seek_lub(). So the approach we are about to describe for maintaining A(x), B(x) extends naturally to more complex queries.

We want to know: which changes to A and B might cause changes to the trace? For example, inserting B(5) would change the trace, since the seek(4) arrow from x = 2 to x = 6 in B would instead land on x = 5. However, inserting B(3) would not change the trace, because the seek(4) arrow seeks a least upper bound for 4; the trace is not sensitive to changes in B at x = 3.

The rules for trace sensitivity of a unary predicate D(x) are straightforward: 

1. Seeks: $v \xrightarrow{\mathrm{seek}(v_s)} v'$

If the iterator for predicate D is positioned at key $v$, and a seek_lub($v_s$) operation is performed so that the iterator is then positioned at $v'$, then the trace is sensitive to changes in D in the interval $[v_s, v']$. (It is not sensitive to changes in $[v, v_s)$, because the seek operation finds a least upper bound for $v_s$.)

2. Increment: $v \xrightarrow{\mathrm{next}} v'$

If the iterator for D is positioned at key $v$, and an increment (next) is performed so that the iterator is then positioned at $v'$, then the trace is sensitive to changes to D in the interval $[v, v']$.

3. If the iterator for predicate D is opened at position $v$ (i.e., the first record is $v$), then the trace is sensitive to changes in D in the interval $[-\infty, v]$.

For the above A(x), B(x) example, the sensitivities are:

$$A_{\mathrm{sens}} = \{[-\infty,0],\ [1,2],\ [2,4],\ [6,6],\ [6,+\infty]\}$$
$$B_{\mathrm{sens}} = \{[-\infty,1],\ [2,2],\ [4,6]\}$$
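The three sensitivity rules can be applied mechanically to a trace. The Python sketch below does this for a hand-written trace of the A(x), B(x) example; the tuple encoding of trace steps and the function name `sensitivities` are assumptions made for illustration, and reaching the end of a predicate is recorded as $+\infty$.

```python
import math

INF = math.inf  # +inf stands for "end"; -inf for "before the first key"


def sensitivities(trace):
    """Apply the three sensitivity rules to a trace of unary iterator operations."""
    sens = {}
    for op in trace:
        kind, pred = op[0], op[1]
        if kind == "open":       # opened at first key v: sensitive on [-inf, v]
            _, _, v = op
            interval = (-INF, v)
        elif kind == "seek":     # seek_lub(vs) moved v -> v2: sensitive on [vs, v2]
            _, _, vs, _v, v2 = op
            interval = (vs, v2)
        elif kind == "next":     # next moved v -> v2: sensitive on [v, v2]
            _, _, v, v2 = op
            interval = (v, v2)
        else:
            raise ValueError(f"unknown operation {kind!r}")
        sens.setdefault(pred, []).append(interval)
    return sens


# Trace of the A(x), B(x) example: ("seek", pred, target, from, to), ("next", pred, from, to).
trace = [
    ("open", "A", 0), ("open", "B", 1),
    ("seek", "A", 1, 0, 2), ("seek", "B", 2, 1, 2),
    ("next", "A", 2, 4), ("seek", "B", 4, 2, 6),
    ("seek", "A", 6, 4, 6), ("next", "A", 6, INF),
]
sens = sensitivities(trace)
```

The result reproduces $A_{\mathrm{sens}}$ and $B_{\mathrm{sens}}$ exactly. Note that the seek from v = 0 to v′ = 2 contributes [1, 2], not [0, 2]: the gap between the old position and the seek target does not matter.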
Given changes δA and δB, we collect the intervals where the predicate is sensitive and a change has occurred:



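$$A_{\mathrm{co}}([x_1,x_2]) \leftarrow \delta A(x,\Delta),\ x \in [x_1,x_2],\ A_{\mathrm{sens}}([x_1,x_2]).$$

$$B_{\mathrm{co}}([x_1,x_2]) \leftarrow \delta B(x,\Delta),\ x \in [x_1,x_2],\ B_{\mathrm{sens}}([x_1,x_2]).$$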

To evaluate the above rules for $A_{\mathrm{co}}$ and $B_{\mathrm{co}}$ efficiently, we use an interval-tree representation for $A_{\mathrm{sens}}$ and $B_{\mathrm{sens}}$ (Section …).



We can then define the change-oracle as:

$$\mathrm{ChangeOracle}(x) \leftarrow x \in [x_1,x_2],\ (A_{\mathrm{co}}([x_1,x_2]);\ B_{\mathrm{co}}([x_1,x_2])).$$

(In our notation, the semicolon indicates a disjunction.) For efficiency, we treat ChangeOracle as a nonmaterialized predicate: during evaluation of the maintenance rule, the iterator for ChangeOracle(x) internally manipulates the $A_{\mathrm{co}}$ and $B_{\mathrm{co}}$ predicates to present the contents of ChangeOracle(x), without explicitly expanding the intervals into individual elements.
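A nonmaterialized ChangeOracle(x) only needs to answer seek_lub queries over the union of its intervals. The following Python sketch is one way to do that, assuming closed intervals and a hypothetical class name `IntervalUnion`; for simplicity it merges the interval lists up front, but it never enumerates the individual keys inside an interval.

```python
import math
from bisect import bisect_left


class IntervalUnion:
    """Presents the union of closed intervals as a seekable unary predicate (sketch)."""

    def __init__(self, *interval_lists):
        merged = []
        for lo, hi in sorted(iv for ivs in interval_lists for iv in ivs):
            if merged and lo <= merged[-1][1]:          # overlaps the previous interval
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        self.ivs = merged
        self.his = [hi for _, hi in merged]

    def seek_lub(self, x):
        """Smallest key >= x inside some interval, or None if there is none."""
        i = bisect_left(self.his, x)                    # first interval ending at or after x
        if i == len(self.ivs):
            return None
        return max(x, self.ivs[i][0])

    def contains(self, x):
        return self.seek_lub(x) == x


A_co = [(6, math.inf)]
B_co = [(2, 2)]
oracle = IntervalUnion(A_co, B_co)
```

With these contributions, `oracle.seek_lub(0)` is 2 and `oracle.seek_lub(3)` is 6, so a join driven by the oracle jumps straight over the keys 3 to 5.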

A note on the maintenance cycle: after each matching interval in $A_{\mathrm{sens}}$ is found, we remove it from $A_{\mathrm{sens}}$; this guarantees that the cost of evaluating the $A_{\mathrm{co}}$ rule is $O((|\delta A| + |A_{\mathrm{co}}|)\log n)$, i.e., proportional to the number of changes plus the number of $A_{\mathrm{co}}$ results. The $\log n$ factor reflects the B-tree heights; a sharper estimate would take $n = \max(|\delta A|, |A_{\mathrm{sens}}|)$. When the maintenance rule is evaluated, we add new sensitivity intervals to $A_{\mathrm{sens}}$ and $B_{\mathrm{sens}}$, so we are ready for the next round of maintenance.
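The $A_{\mathrm{co}}$ rule, together with the removal of matched intervals, can be sketched in Python as follows. Instead of an interval tree, this sketch assumes the intervals are sorted and non-nested (both endpoints nondecreasing), which holds for the intervals produced by one forward pass of a unary iterator; under that assumption a binary search on upper endpoints finds all intervals containing a key. The function name `match_and_remove` is hypothetical, and the list rebuild at the end is linear, unlike a balanced tree.

```python
import math
from bisect import bisect_left


def match_and_remove(sens, changes):
    """Return (intervals hit by some changed key, intervals left for the next round).

    Assumes `sens` is sorted by lower endpoint and non-nested, so the upper
    endpoints are sorted as well.
    """
    his = [hi for _, hi in sens]
    hit = set()
    for key, _op in changes:
        i = bisect_left(his, key)                  # first interval with hi >= key
        while i < len(sens) and sens[i][0] <= key:
            hit.add(i)                             # lo <= key <= hi: sensitive here
            i += 1
    contributions = [sens[i] for i in sorted(hit)]
    remaining = [iv for i, iv in enumerate(sens) if i not in hit]
    return contributions, remaining


A_sens = [(-math.inf, 0), (1, 2), (2, 4), (6, 6), (6, math.inf)]
B_sens = [(-math.inf, 1), (2, 2), (4, 6)]
A_co, A_sens = match_and_remove(A_sens, [(5, "ERASE"), (8, "INSERT")])
B_co, B_sens = match_and_remove(B_sens, [(2, "ERASE"), (3, "INSERT")])
```

Run on the changes of the example that follows, this yields $A_{\mathrm{co}} = \{[6,+\infty]\}$ and $B_{\mathrm{co}} = \{[2,2]\}$, and both intervals disappear from the sensitivity lists.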



4.1.1 Example of maintenance for a unary join

For a concrete example, suppose we have: 

$$\delta A = \{(5, \mathrm{ERASE}),\ (8, \mathrm{INSERT})\}$$
$$\delta B = \{(2, \mathrm{ERASE}),\ (3, \mathrm{INSERT})\}$$



This diagram shows the changes made to A and B, and the sensitivity intervals: 

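[Figure: the keys of A and B with the changes marked (A: 5 erased, 8 inserted; B: 2 erased, 3 inserted), aligned with the sensitivity intervals of A ([-∞,0], [1,2], [2,4], [6,6], [6,+∞]) and of B ([-∞,1], [2,2], [4,6]); "end" marks +∞.]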

When we evaluate the rules for $A_{\mathrm{co}}$ and $B_{\mathrm{co}}$, we find these contributions:

Change            Contributions to A_co, B_co
A: (5, ERASE)     (none)
A: (8, INSERT)    {[6, +∞]}
B: (2, ERASE)     {[2, 2]}
B: (3, INSERT)    (none)



and so

$$\mathrm{ChangeOracle} = \bigcup \{[2,2],\ [6,+\infty]\}.$$
Maintenance: because of the ChangeOracle, we skip immediately to x = 2; there we find that 2 is no longer in A ∩ B. Then we skip to the start of the next interval in the ChangeOracle, x ∈ [6, +∞]:



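[Figure: maintenance trace showing the ChangeOracle intervals [2,2] and [6,+∞] above the iterators on A′ and B′ and the resulting δ(A ∩ B); the labelled operations are seek(2), seek(3), next, seek(6), and seek(8).]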



During evaluation of the maintenance rule, the sensitivity intervals are updated, so we are ready for the next round 
of maintenance: the intervals we examined because of the ChangeOracle are removed, and we insert new ones 
due to iterator operations as we evaluate the rule. The revised sensitivity intervals are: 
[Figure: revised sensitivity intervals for A and B; the recoverable interval labels are [-∞,0], [-∞,1], [2,3], [4,6], [6,6], and [8,+∞].]
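A quick way to convince oneself that restricting attention to the ChangeOracle loses nothing in this example is to compare the old and new intersections only at keys the oracle admits. The Python sketch below does exactly that by brute-force membership tests; it is a correctness check under stated assumptions, not a model of how LFTJ drives the maintenance rule, and the name `delta_in_oracle` is hypothetical.

```python
def delta_in_oracle(old_a, old_b, new_a, new_b, in_oracle):
    """Diff old and new A ∩ B, looking only at keys admitted by the change oracle."""
    candidates = sorted(k for k in old_a | old_b | new_a | new_b if in_oracle(k))
    delta = []
    for k in candidates:
        was = k in old_a and k in old_b
        now = k in new_a and k in new_b
        if was and not now:
            delta.append((k, "ERASE"))
        elif now and not was:
            delta.append((k, "INSERT"))
    return delta


def in_oracle(k):
    # ChangeOracle from the example: the union of [2, 2] and [6, +inf].
    return k == 2 or k >= 6


old_a, old_b = {0, 2, 4, 5, 6}, {1, 2, 6, 7}
new_a, new_b = {0, 2, 4, 6, 8}, {1, 3, 6, 7}
delta = delta_in_oracle(old_a, old_b, new_a, new_b, in_oracle)
```

The restricted diff is `[(2, "ERASE")]`, the same as diffing the full intersections, which is what the sensitivity argument predicts.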



4.2 Sensitivity indices for predicates with multiple arguments 

For a trivial query like A(x), B(x), sensitivity indices and the change-oracle are of marginal use; in fact, our system would not use them for such a simple query. For complex queries, however, these techniques make a tremendous difference.



Consider this example: 



$$F(x,y) \leftarrow G(x,z),\ H(y,z),\ I(x,y,z),\ R(z).$$



Suppose the optimizer chooses the key-variable ordering [x, y, z]. If a fact is retracted from R(z), the change-oracle lets us examine only those (x, y, z) tuples where that fact was used.

The approach to building the change-oracle for predicates with multiple key arguments generalizes that described 
in the previous section. First, a bit of background. 

Recall that LFTJ evaluates rules using a 'backtracking search through tuple space', which conceptually consists of nested leapfrog joins on unary predicates. We write G(x, _) for a projection, and $G_x(y)$ for a curried version of G for a specific x. For instance, given G = {(0, 10), (0, 20), (1, 30)}, we would have:

$$G(x,\_) = \{0, 1\}$$
$$G_0(y) = \{10, 20\}$$
$$G_1(y) = \{30\}$$
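For readers who prefer code, the projection and currying notation corresponds to the following Python sketch over a relation stored as a set of tuples. The helper names are hypothetical, and, as the text notes, an LFTJ implementation never builds these sets explicitly.

```python
from collections import defaultdict


def project_first(rel):
    """G(x, _): the set of first components."""
    return {t[0] for t in rel}


def curry_first(rel):
    """Map each x to G_x, the remaining components of the tuples starting with x."""
    out = defaultdict(set)
    for x, *rest in rel:
        out[x].add(tuple(rest) if len(rest) > 1 else rest[0])
    return dict(out)


G = {(0, 10), (0, 20), (1, 30)}
```

Here `project_first(G)` is `{0, 1}` and `curry_first(G)` maps 0 to `{10, 20}` and 1 to `{30}`, matching $G(x,\_)$, $G_0(y)$, and $G_1(y)$ above.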

(Note: we do not explicitly construct these projections and curried versions; this is just for exposition.) 

The LFTJ algorithm does a backtracking search through the [x, y, z] space, first seeking a binding for x, then proceeding to a binding for y once x is found, and so on. Conceptually, the three nested queries used are:






1. $G(x,\_),\ I(x,\_,\_)$

2. $H(y,\_),\ I_x(y,\_)$

3. $G_x(z),\ H_y(z),\ I_{xy}(z),\ R(z)$
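The three nested queries can be mirrored by a backtracking loop. The Python sketch below evaluates $F(x,y) \leftarrow G(x,z), H(y,z), I(x,y,z), R(z)$ with key order [x, y, z], assuming relations are sets of tuples; at each level it intersects plain sets instead of running a leapfrog join, so it only illustrates the nesting, not LFTJ's complexity. The name `evaluate_F` and the sample data are hypothetical.

```python
def evaluate_F(G, H, I, R):
    """Backtracking evaluation of F(x,y) <- G(x,z), H(y,z), I(x,y,z), R(z), order [x, y, z]."""
    F = set()
    for x in sorted({g[0] for g in G} & {i[0] for i in I}):                 # 1. G(x,_), I(x,_,_)
        I_x = {(yi, zi) for (xi, yi, zi) in I if xi == x}
        G_x = {zg for (xg, zg) in G if xg == x}
        for y in sorted({h[0] for h in H} & {yi for (yi, _z) in I_x}):      # 2. H(y,_), I_x(y,_)
            H_y = {zh for (yh, zh) in H if yh == y}
            I_xy = {zi for (yi, zi) in I_x if yi == y}
            if G_x & H_y & I_xy & R:                                         # 3. any z survives
                F.add((x, y))
    return F


G = {(0, 5), (1, 6)}
H = {(7, 5), (8, 6)}
I = {(0, 7, 5), (1, 8, 6), (0, 8, 5)}
R = {5}
```

With this data, only x = 0, y = 7 reaches a z (namely 5) that satisfies all four atoms, so `evaluate_F(G, H, I, R)` is `{(0, 7)}`.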

When we evaluate the query, we record sensitivity information much as described earlier for unary predicates. However, we also record information about the bindings of other key variables. The sensitivity predicate for R, for instance, would have the form $R_{\mathrm{sens},z}([z_1,z_2], x, y)$. If a fact is removed from R, we can quickly determine the (x, y, z) bindings where that fact was used; if a fact is added, we can quickly determine where it could be used.

The sensitivity predicate for $H_y(z)$ illustrates the general form:

$$H_{\mathrm{sens},z}(\underbrace{y}_{(1)},\ \underbrace{[z_1,z_2]}_{(2)},\ \underbrace{x}_{(3)})$$

In position (1) we have variables that precede z in the argument list for H; in position (2) we have the sensitivity 
interval for z; in position (3) we have key-variables that are bound before z but do not appear in the argument-list 
for H. 
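The three positions of a sensitivity predicate follow directly from the atom's argument list and the chosen key order. This Python sketch computes them; the function name `sens_signature` is hypothetical, and it assumes each variable occurs at most once in an atom.

```python
def sens_signature(args, var, key_order):
    """Positions of the sensitivity index for `var` in an atom with arguments `args`.

    (1) arguments preceding var in the atom, (2) var itself (stored as an interval),
    (3) key variables bound before var in key_order that do not occur in the atom.
    """
    before = tuple(args[: args.index(var)])
    bound_earlier = key_order[: key_order.index(var)]
    context = tuple(v for v in bound_earlier if v not in args)
    return before, var, context


order = ["x", "y", "z"]
```

For H(y, z) and variable z this gives `(("y",), "z", ("x",))`, i.e. $H_{\mathrm{sens},z}(y, [z_1,z_2], x)$; for R(z) it gives `((), "z", ("x", "y"))`, i.e. $R_{\mathrm{sens},z}([z_1,z_2], x, y)$.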

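So, conceptually, we would have these sensitivity predicates:

$$G_{\mathrm{sens},x}([x_1,x_2])$$
$$G_{\mathrm{sens},z}(x, [z_1,z_2], y)$$
$$H_{\mathrm{sens},y}([y_1,y_2], x)$$
$$H_{\mathrm{sens},z}(y, [z_1,z_2], x)$$
$$I_{\mathrm{sens},x}([x_1,x_2])$$
$$I_{\mathrm{sens},y}(x, [y_1,y_2])$$
$$I_{\mathrm{sens},z}(x, y, [z_1,z_2])$$
$$R_{\mathrm{sens},z}([z_1,z_2], x, y)$$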

In practice we can drop sensitivity indices where the key arguments of the atom form a prefix of the key ordering chosen by the optimizer. For example, our implementation would not bother creating sensitivity indices for I, since its arguments match the chosen key order [x, y, z]; it would also not create the index $G_{\mathrm{sens},x}([x_1,x_2])$, since (x) is also a prefix of [x, y, z].
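One reading of the pruning rule, consistent with both examples in the text, is: skip the index for a variable when the atom's arguments up to and including that variable form a prefix of the key order. The Python sketch below encodes that interpretation; it is an assumption on our part, and the function name `needs_sensitivity_index` is hypothetical.

```python
def needs_sensitivity_index(args, var, key_order):
    """Assumed rule: skip the index for `var` when args[..var] is a prefix of key_order."""
    upto = list(args[: args.index(var) + 1])
    return upto != list(key_order[: len(upto)])


order = ["x", "y", "z"]
```

Under this reading, all three indices for I(x, y, z) and $G_{\mathrm{sens},x}$ are dropped, while $G_{\mathrm{sens},z}$, both indices for H, and $R_{\mathrm{sens},z}$ are kept.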



4.2.1 Tree surgery operations 

The delta-iterator described in Section 3.3 lets us efficiently enumerate the changed records between two consecutive versions of a predicate. For building the change oracle we need finer information, namely the changes made to the trie presentation of the predicate. We call such changes tree surgery operations; they consist of either inserting or removing branches.

For example, consider these two versions of a predicate A(x, y, z):



Version 1      Version 2
(0,30,80)      (0,30,80)
(0,30,81)
(1,35,60)      (1,35,60)
(1,35,61)      (1,35,61)
(3,40,90)
(3,50,91)      (3,50,91)
(3,50,92)
               (4,60,71)



The delta-iterator would produce this stream of changes: 
ERASE  0,30,81
ERASE  3,40,90
ERASE  3,50,92
INSERT 4,60,71

The trie presentations of the two versions are:

[Figure: trie presentations of Version 1 and Version 2 of A(x, y, z).]

The tree surgery operations would be:

ERASE  0-30-81
ERASE  3-40-90
ERASE  3-40
ERASE  3-50-92
INSERT 4
INSERT 4-60
INSERT 4-60-71



It is reasonably straightforward and efficient to adapt a delta-iterator into an iterator of tree-surgery operations, 
with a little bookkeeping; we omit the details here. 
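Since the details are omitted, here is one possible adaptation, sketched in Python; it is not necessarily the bookkeeping used in our system. An erasure climbs from the leaf while the vertex has no tuples left in the new version; an insertion descends from the root, emitting every vertex absent from the old version. For brevity the sketch precomputes the prefix sets of both versions, where a real adaptor would probe the tries instead; the function names are hypothetical.

```python
def prefixes(tuples):
    """Every trie vertex (non-empty prefix) of a set of tuples."""
    return {t[:d] for t in tuples for d in range(1, len(t) + 1)}


def tree_surgery(delta, old_version, new_version):
    """Turn a sorted stream of (op, tuple) changes into trie surgery operations."""
    old_p, new_p = prefixes(old_version), prefixes(new_version)
    ops, seen = [], set()

    def emit(op, vertex):
        if (op, vertex) not in seen:      # several changes may share an ancestor
            seen.add((op, vertex))
            ops.append((op, vertex))

    for op, t in delta:
        if op == "ERASE":
            # Climb from the leaf while the vertex has no tuples left in the new version.
            for d in range(len(t), 0, -1):
                if t[:d] in new_p:
                    break
                emit("ERASE", t[:d])
        else:
            # Descend from the root, emitting every vertex absent from the old version.
            for d in range(1, len(t) + 1):
                if t[:d] not in old_p:
                    emit("INSERT", t[:d])
    return ops


v1 = {(0, 30, 80), (0, 30, 81), (1, 35, 60), (1, 35, 61),
      (3, 40, 90), (3, 50, 91), (3, 50, 92)}
v2 = {(0, 30, 80), (1, 35, 60), (1, 35, 61), (3, 50, 91), (4, 60, 71)}
delta = [("ERASE", (0, 30, 81)), ("ERASE", (3, 40, 90)),
         ("ERASE", (3, 50, 92)), ("INSERT", (4, 60, 71))]
ops = tree_surgery(delta, v1, v2)
```

On the two versions above, this reproduces the seven surgery operations in the listed order, including the extra ERASE of vertex 3-40 and the INSERTs of vertices 4 and 4-60.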



4.2.2 Matching tree surgery operations with sensitivity indices 



Returning to our running example, recall that we have these two sensitivity indices for H(y, z):

$$H_{\mathrm{sens},y}([y_1,y_2], x)$$
$$H_{\mathrm{sens},z}(y, [z_1,z_2], x)$$

We use a tree-surgery adaptor to obtain the changes made to the trie presentation of H from the delta-iterator giving the changes in H. Tree surgeries on H come in two forms: those that insert or remove vertices at depth 1, and those that insert or remove vertices at depth 2. We collect these surgeries by depth, writing $\delta H^1(y,\Delta)$ and $\delta H^2(y,z,\Delta)$ for depth-1 and depth-2 surgeries, respectively.

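Tree surgery operations of depth 1 are matched with intervals in $H_{\mathrm{sens},y}([y_1,y_2], x)$, and tree surgery operations of depth 2 are matched with intervals in $H_{\mathrm{sens},z}(y, [z_1,z_2], x)$. We call the resulting change-oracle contributions $H_{\mathrm{co},y}$ and $H_{\mathrm{co},z}$; they are defined by:

$$H_{\mathrm{co},y}(x, [y_1,y_2]) \leftarrow \delta H^1(y,\Delta),\ y \in [y_1,y_2],\ H_{\mathrm{sens},y}([y_1,y_2], x).$$
$$H_{\mathrm{co},z}(x, y, [z_1,z_2]) \leftarrow \delta H^2(y,z,\Delta),\ z \in [z_1,z_2],\ H_{\mathrm{sens},z}(y, [z_1,z_2], x).$$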



As mentioned previously, this is implemented with interval trees and is very efficient: its cost is proportional (modulo log n) to the number of tree-surgery operations plus the number of matches to those operations in the sensitivity indices (Section 3.2). (Also recall that we remove matched intervals from the sensitivity indices.)
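The two rules for $H_{\mathrm{co},y}$ and $H_{\mathrm{co},z}$ can be read as the following Python sketch. It uses linear scans instead of interval trees, so it does not achieve the log-factor bound, and it omits the removal of matched intervals; the function name, the tuple layouts, and the sample data are all hypothetical.

```python
from collections import defaultdict


def match_H(dH1, dH2, H_sens_y, H_sens_z):
    """Join depth-1 and depth-2 surgeries on H with its sensitivity indices.

    dH1: (y, op)        H_sens_y: ((y1, y2), x)
    dH2: (y, z, op)     H_sens_z: (y, (z1, z2), x)
    Returns H_co_y = {(x, (y1, y2))} and H_co_z = {(x, y, (z1, z2))}.
    """
    H_co_y = {(x, (y1, y2))
              for (y, _op) in dH1
              for ((y1, y2), x) in H_sens_y if y1 <= y <= y2}
    by_y = defaultdict(list)   # depth-2 intervals only matter under the same y
    for y, iv, x in H_sens_z:
        by_y[y].append((iv, x))
    H_co_z = {(x, y, (z1, z2))
              for (y, z, _op) in dH2
              for ((z1, z2), x) in by_y.get(y, ()) if z1 <= z <= z2}
    return H_co_y, H_co_z


H_sens_y = [((5, 9), 0), ((12, 12), 1)]
H_sens_z = [(7, (3, 8), 0), (7, (10, 20), 1), (12, (3, 8), 1)]
dH1 = [(8, "INSERT")]
dH2 = [(7, 4, "ERASE"), (12, 15, "INSERT")]
H_co_y, H_co_z = match_H(dH1, dH2, H_sens_y, H_sens_z)
```

The depth-2 surgery at (12, 15) matches nothing, because the only interval recorded under y = 12 is [3, 8]; a depth-2 match needs both the same y and z inside the interval.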

We define contributions to the change oracle from G, I, and R similarly. Finally, we define the change oracle by these (nonmaterialized) definitions:



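$$\mathrm{ChangeOracle}(x,y,z) \leftarrow \mathrm{ChangeOracle}_1(x);\ \mathrm{ChangeOracle}_2(x,y);\ \mathrm{ChangeOracle}_3(x,y,z).$$

$$\mathrm{ChangeOracle}_1(x) \leftarrow (G_{\mathrm{co},x}([x_1,x_2]);\ I_{\mathrm{co},x}([x_1,x_2])),\ x \in [x_1,x_2].$$

$$\mathrm{ChangeOracle}_2(x,y) \leftarrow (H_{\mathrm{co},y}(x,[y_1,y_2]);\ I_{\mathrm{co},y}(x,[y_1,y_2])),\ y \in [y_1,y_2].$$

$$\mathrm{ChangeOracle}_3(x,y,z) \leftarrow (G_{\mathrm{co},z}(x,y,[z_1,z_2]);\ H_{\mathrm{co},z}(x,y,[z_1,z_2]);\ I_{\mathrm{co},z}(x,y,[z_1,z_2]);\ R_{\mathrm{co},z}(x,y,[z_1,z_2])),\ z \in [z_1,z_2].$$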
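As a membership test, the three-level disjunction reads as in the Python sketch below: a depth-1 contribution covers every (y, z) under a matching x, and a depth-2 contribution covers every z under a matching (x, y). The sketch assumes the per-level contributions have already been unioned into plain lists and uses linear scans; in the system described here, ChangeOracle is instead presented as a nonmaterialized iterator. All names are hypothetical.

```python
def in_interval(v, iv):
    lo, hi = iv
    return lo <= v <= hi


def change_oracle(x, y, z, co1, co2, co3):
    """Membership in ChangeOracle(x, y, z) as a disjunction over three levels.

    co1: [x1, x2] intervals        (union of G_co,x and I_co,x)
    co2: (x, [y1, y2]) pairs       (union of H_co,y and I_co,y)
    co3: (x, y, [z1, z2]) triples  (union of G_co,z, H_co,z, I_co,z and R_co,z)
    """
    return (any(in_interval(x, iv) for iv in co1)
            or any(cx == x and in_interval(y, iv) for cx, iv in co2)
            or any(cx == x and cy == y and in_interval(z, iv) for cx, cy, iv in co3))


co1 = [(10, 12)]
co2 = [(0, (5, 9))]
co3 = [(0, 7, (3, 8))]
```

For example, (11, 99, 99) is admitted by the depth-1 interval alone, while (0, 10, 4) is rejected because no contribution covers y = 10 under x = 0.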

We can then maintain the rule using the maintenance rule:

$$\delta F(x,y,\Delta) \leftarrow (\mathrm{Body}[G,H,I] \cdots \mathrm{Body}[G',H',I'])(x,y,z,\Delta),\ \mathrm{ChangeOracle}(x,y,z).$$

Recall that when evaluating the maintenance rule, we accumulate new intervals to the sensitivity indices, so we 
are ready for the next round of maintenance. 